{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-pVhOfzLx9us"
      },
      "source": [
        "#Waymo Open Dataset 2D Panoramic Video Panoptic Segmentation Tutorial\n",
        "\n",
        "- Website: https://waymo.com/open\n",
        "- GitHub: https://github.com/waymo-research/waymo-open-dataset\n",
        "\n",
        "This tutorial demonstrates how to decode and interpret the 2D panoramic video panoptic segmentation labels. Visit the [Waymo Open Dataset Website](https://waymo.com/open) to download the full dataset.\n",
        "\n",
        "## Dataset\n",
        "This dataset contains panoptic segmentation labels for a subset of the Open\n",
        "Dataset camera images. In addition, we provide associations for instances between different camera images and over time, allowing for panoramic video panoptic segmentation.\n",
        "\n",
        "For the training set, we provide tracked sequences of 5 temporal frames, spaced at t=[0ms, 400ms, 600ms, 800ms, 1200ms]. For each labeled time step, we  label all 5 cameras around the Waymo vehicle, resulting in a total of 25 labeled images per sequence. This allows for tracking over a variety of different time frames and viewpoints.\n",
        "\n",
        "For the validation set, we label entire run segments at 5Hz (every other image), resulting in sequences of 100 temporal frames over 5 cameras (500 labels per sequence).\n",
        "\n",
        "## Instructions\n",
        "This colab will demonstrate how to read the labels, and to extract panoptic labels with consistent instance ID tracks for any number of frames.\n",
        "\n",
        "To use, open this notebook in [Colab](https://colab.research.google.com).\n",
        "\n",
        "Uncheck the box \"Reset all runtimes before running\" if you run this colab directly from the remote kernel. Alternatively, you can make a copy before trying to run it by following \"File \u003e Save copy in Drive ...\".\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1sPLur9kMaLh"
      },
      "source": [
        "# Package Installation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iEsf_G5_MeS-"
      },
      "source": [
        "Package installation\n",
        "Please follow the instructions in [tutorial.ipynb](https://github.com/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rqs8_62VNc4T"
      },
      "source": [
        "# Imports and global definitions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YuNAlbQpNkLa"
      },
      "outputs": [],
      "source": [
        "# Data location. Please edit.\n",
        "\n",
        "# A tfrecord containing tf.Example protos as downloaded from the Waymo dataset\n",
        "# webpage.\n",
        "\n",
        "# Replace this path with your own tfrecords.\n",
        "FILE_NAME = '/content/waymo-od/tutorial/.../tfexample.tfrecord'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xCDNLdp9Ni8a"
      },
      "outputs": [],
      "source": [
        "import immutabledict\n",
        "import matplotlib.pyplot as plt\n",
        "import tensorflow as tf\n",
        "import numpy as np\n",
        "\n",
        "if not tf.executing_eagerly():\n",
        "  tf.compat.v1.enable_eager_execution()\n",
        "\n",
        "from waymo_open_dataset import dataset_pb2 as open_dataset\n",
        "from waymo_open_dataset.utils import camera_segmentation_utils"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ibor0U9XBlX6"
      },
      "source": [
        "# Read 2D panoptic segmentation labels from Frame proto\n",
        "Note that only a subset of the frames have 2D panoptic labels."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "O41R3lljM9Ym"
      },
      "outputs": [],
      "source": [
        "dataset = tf.data.TFRecordDataset(FILE_NAME, compression_type='')\n",
        "frames_with_seg = []\n",
        "sequence_id = None\n",
        "for data in dataset:\n",
        "  frame = open_dataset.Frame()\n",
        "  frame.ParseFromString(bytearray(data.numpy()))\n",
        "  # Save frames which contain CameraSegmentationLabel messages. We assume that\n",
        "  # if the first image has segmentation labels, all images in this frame will.\n",
        "  if frame.images[0].camera_segmentation_label.panoptic_label:\n",
        "    frames_with_seg.append(frame)\n",
        "    if sequence_id is None:\n",
        "      sequence_id = frame.images[0].camera_segmentation_label.sequence_id\n",
        "    # Collect 3 frames for this demo. However, any number can be used in practice.\n",
        "    if frame.images[0].camera_segmentation_label.sequence_id != sequence_id or len(frames_with_seg) \u003e 2:\n",
        "      break"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wHK95_JBUXUx"
      },
      "outputs": [],
      "source": [
        "# Organize the segmentation labels in order from left to right for viz later.\n",
        "camera_left_to_right_order = [open_dataset.CameraName.SIDE_LEFT,\n",
        "                              open_dataset.CameraName.FRONT_LEFT,\n",
        "                              open_dataset.CameraName.FRONT,\n",
        "                              open_dataset.CameraName.FRONT_RIGHT,\n",
        "                              open_dataset.CameraName.SIDE_RIGHT]\n",
        "segmentation_protos_ordered = []\n",
        "for frame in frames_with_seg:\n",
        "  segmentation_proto_dict = {image.name : image.camera_segmentation_label for image in frame.images}\n",
        "  segmentation_protos_ordered.append([segmentation_proto_dict[name] for name in camera_left_to_right_order])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wcDCYUF8Y1pY"
      },
      "source": [
        "# Read a single panoptic label"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zYtrSEkXgpH8"
      },
      "outputs": [],
      "source": [
        "# Decode a single panoptic label.\n",
        "panoptic_label_front = camera_segmentation_utils.decode_single_panoptic_label_from_proto(\n",
        "    segmentation_protos_ordered[0][open_dataset.CameraName.FRONT]\n",
        ")\n",
        "\n",
        "# Separate the panoptic label into semantic and instance labels.\n",
        "semantic_label_front, instance_label_front = camera_segmentation_utils.decode_semantic_and_instance_from_panoptic(\n",
        "    panoptic_label_front,\n",
        "    segmentation_protos_ordered[0][open_dataset.CameraName.FRONT].panoptic_label_divisor\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6Cm8lagZY3ip"
      },
      "source": [
        "# Read panoptic labels with consistent instance IDs over cameras and time"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0IEvJgL9gMdR"
      },
      "outputs": [],
      "source": [
        "# The dataset provides tracking for instances between cameras and over time.\n",
        "# By setting remap_values=True, this function will remap the instance IDs in\n",
        "# each image so that instances for the same object will have the same ID between\n",
        "# different cameras and over time.\n",
        "segmentation_protos_flat = sum(segmentation_protos_ordered, [])\n",
        "panoptic_labels, is_tracked_masks, panoptic_label_divisor = camera_segmentation_utils.decode_multi_frame_panoptic_labels_from_protos(\n",
        "    segmentation_protos_flat, remap_values=True\n",
        ")\n",
        "\n",
        "# We can further separate the semantic and instance labels from the panoptic\n",
        "# labels.\n",
        "NUM_CAMERA_FRAMES = 5\n",
        "semantic_labels_multiframe = []\n",
        "instance_labels_multiframe = []\n",
        "for i in range(0, len(segmentation_protos_flat), NUM_CAMERA_FRAMES):\n",
        "  semantic_labels = []\n",
        "  instance_labels = []\n",
        "  for j in range(NUM_CAMERA_FRAMES):\n",
        "    semantic_label, instance_label = camera_segmentation_utils.decode_semantic_and_instance_labels_from_panoptic_label(\n",
        "      panoptic_labels[i + j], panoptic_label_divisor)\n",
        "    semantic_labels.append(semantic_label)\n",
        "    instance_labels.append(instance_label)\n",
        "  semantic_labels_multiframe.append(semantic_labels)\n",
        "  instance_labels_multiframe.append(instance_labels)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c6gpEl0if3nu"
      },
      "source": [
        "# Visualize the panoptic segmentation labels"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Hj5lXAlJXjKk"
      },
      "outputs": [],
      "source": [
        "def _pad_to_common_shape(label):\n",
        "  return np.pad(label, [[1280 - label.shape[0], 0], [0, 0], [0, 0]])\n",
        "\n",
        "# Pad labels to a common size so that they can be concatenated.\n",
        "instance_labels = [[_pad_to_common_shape(label) for label in instance_labels] for instance_labels in instance_labels_multiframe]\n",
        "semantic_labels = [[_pad_to_common_shape(label) for label in semantic_labels] for semantic_labels in semantic_labels_multiframe]\n",
        "instance_labels = [np.concatenate(label, axis=1) for label in instance_labels]\n",
        "semantic_labels = [np.concatenate(label, axis=1) for label in semantic_labels]\n",
        "\n",
        "instance_label_concat = np.concatenate(instance_labels, axis=0)\n",
        "semantic_label_concat = np.concatenate(semantic_labels, axis=0)\n",
        "panoptic_label_rgb = camera_segmentation_utils.panoptic_label_to_rgb(\n",
        "    semantic_label_concat, instance_label_concat)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fgCDPt9zeV_k"
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(64, 60))\n",
        "plt.imshow(panoptic_label_rgb)\n",
        "plt.grid(False)\n",
        "plt.axis('off')\n",
        "plt.show()"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "last_runtime": {
        "build_target": "",
        "kind": "local"
      },
      "name": "Waymo Open Dataset 2D Panoramic Video Panoptic Segmentation Tutorial.ipynb",
      "private_outputs": true,
      "provenance": [
        {
          "file_id": "tutorial_2d_pvps.ipynb",
          "timestamp": 1649874845881
        },
        {
          "file_id": "tutorial.ipynb",
          "timestamp": 1644287712198
        }
      ],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
